Student Team: Yes
Did you use data from both mini-challenges? No
Approximately how many hours were spent working on this submission in total? ±100 hours
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? Yes
Video Download
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Questions
MC1.1 – Characterize the attendance at DinoFun World on this weekend. Describe up to twelve different types of groups at the park on this weekend.
a. How big is this type of group?
b. Where does this type of group like to go in the park?
c. How common is this type of group?
d. What are your other observations about this type of group?
e. What can you infer about this type of group?
f. If you were to make one improvement to the park to better meet this group’s needs, what would it be?
Limit your response to no more than 12 images and 1000 words.
First, we aggregated each day’s data at the visitor level, summing the number of check-ins by Ride Type and Location (how the Ride Type and Location are derived is explained in the answer to Question 2). We also captured the time of arrival and departure so as to derive the time spent per day at the theme park. We then merged the 3 days of data into 1 dataset and introduced a new column, “Repeat Visitors”, to indicate whether the visitor came on a single day only (Fri, Sat or Sun) or on several days (“Fri and Sat”, “Sat and Sun”, or “Fri, Sat and Sun”). The total number of check-ins by Ride Type was calculated by summing the corresponding figures over the 3 days. Similarly, the ratio of check-ins by Ride Type was derived by dividing each absolute figure by the visitor’s total number of check-ins.
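The aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout `(visitor_id, day, ride_type)`, the sample values and the function names `summarise_visitors` and `label_days` are hypothetical and do not come from the challenge data or from the tools we actually used.

```python
from collections import Counter, defaultdict

# Hypothetical check-in records: (visitor_id, day, ride_type).
# The layout and values are illustrative, not the challenge schema.
checkins = [
    (1, "Fri", "Thrill Rides"),
    (1, "Fri", "Thrill Rides"),
    (1, "Sat", "Rides for Everyone"),
    (2, "Sun", "Kiddie Rides"),
    (2, "Sun", "Thrill Rides"),
]

DAY_ORDER = ["Fri", "Sat", "Sun"]


def label_days(present):
    """Build a label such as 'Fri', 'Fri and Sat' or 'Fri, Sat and Sun'."""
    if len(present) == 1:
        return present[0]
    return ", ".join(present[:-1]) + " and " + present[-1]


def summarise_visitors(records):
    """Aggregate check-ins per visitor over all days."""
    counts = defaultdict(Counter)  # visitor -> ride type -> number of check-ins
    days = defaultdict(set)        # visitor -> days on which the visitor appears
    for visitor, day, ride_type in records:
        counts[visitor][ride_type] += 1
        days[visitor].add(day)

    summary = {}
    for visitor, by_type in counts.items():
        total = sum(by_type.values())
        present = [d for d in DAY_ORDER if d in days[visitor]]
        summary[visitor] = {
            "repeat_visitors": label_days(present),
            "total_checkins": total,
            "counts": dict(by_type),
            # Share of each ride type in the visitor's total check-ins.
            "ratios": {t: n / total for t, n in by_type.items()},
        }
    return summary


summary = summarise_visitors(checkins)
```

Here `summary[1]["repeat_visitors"]` is the “Repeat Visitors” label for visitor 1, and `summary[1]["ratios"]` holds that visitor’s ratio of check-ins by Ride Type.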
We used hierarchical clustering in JMP and used the dendrogram and heatmap generated from the clustering results to identify the distinct features of each cluster. We also used the slider available in JMP to vary the number of clusters generated. After that, we used the coordinated linked views to select a specific cluster and look at the histograms of the various variables in order to identify the distinct features of that cluster.
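For readers without JMP, the idea of hierarchical clustering with an adjustable number of clusters can be sketched as follows. This is a minimal illustration, not the JMP procedure: the linkage settings we used in JMP are not stated here, so average linkage on Euclidean distance is an assumption made for brevity, and the `profiles` values (ratios of Thrill Rides, Rides for Everyone and Kiddie Rides) are made up.

```python
from itertools import combinations
from math import dist


def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    Closeness is the average pairwise Euclidean distance between members
    (average linkage, an assumption; JMP may use a different method).
    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # combinations() yields index pairs with a < b, so deleting b is safe.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical per-visitor ratios: (Thrill Rides, Rides for Everyone, Kiddie Rides).
profiles = [
    (0.90, 0.10, 0.00),
    (0.80, 0.20, 0.00),
    (0.30, 0.30, 0.40),
    (0.35, 0.35, 0.30),
    (0.85, 0.15, 0.00),
]

two_clusters = agglomerate(profiles, 2)
```

Changing the second argument plays the role of the slider: a smaller value gives the broad “big groups”, a larger value splits them into finer subgroups.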
We approached the write-up in this manner: we first identified general patterns in a big cluster, then zoomed into a subset of that cluster to identify more distinctive features.
Results of Hierarchical Clustering
Big Group 1
Fig 1-1. Hierarchical Cluster of Big Group 1
a. The group size ranges from 1 to 10.
Fig 1-2. Group sizes found in Big Group 1
b. They prefer Thrill Rides. They also take Rides for Everyone, but not as often as Thrill Rides. They don’t take Kiddie Rides.
Fig 1-3. Distribution of Total check-in over 3 days (breakdown by Ride Type)
c. There are 793 such groups.
d. Most of them check in in the morning.
e. They are Adrenaline Seekers.
f. As an improvement, the theme park could offer adrenaline seekers priority access to Rides for Everyone or Kiddie Rides once they have taken Thrill Rides more than 5 times. This would help spread out the traffic.
Group 1a
Fig 1a-1. Hierarchical Cluster of Group 1a
Fig 1a-2. Group sizes found in Group 1a
Fig 1a-3. Distribution of Total check-in over 3 days (breakdown by Ride Type)
Fig 1a-4. Other observations about Group 1a
Group 1b
Fig 1b-1. Hierarchical Cluster of Group 1b
Fig 1b-2. Group sizes found in Group 1b
Fig 1b-3. Distribution of Total check-in over 3 days (breakdown by Ride Type)
Fig 1b-4. Other observations about Group 1b
Group 1c
Fig 1c-1. Hierarchical Cluster of Group 1c
Fig 1c-2. Group sizes found in Group 1c
Fig 1c-3. Distribution of Total check-in over 3 days (breakdown by Ride Type)
Fig 1c-4. Other observations about Group 1c
Big Group 2
a. The group size ranges from 1 to 42.
b. They prefer Thrill Rides and take Rides for Everyone and Kiddie Rides as well. They also go to shows more than Big Group 1 does.
c. There are 513 such groups.
d. Most of them check in in the morning and spend one day in the park; their total number of check-ins is around 18.
e. They are probably Family groups.
f. If you were to make one improvement to the park to better meet this group’s needs, what would it be?
Group 2a
a. The group size ranges from 1 to 11, with most of them having a group size of less than 5.
b. This group has an equal preference for Thrill Rides and Rides for Everyone, and a slightly lower preference for Kiddie Rides. However, the absolute number of rides by ride type is in the lower range.
c. There are 35 such groups.
d. Most of them spend 1 day in the theme park and arrive in the morning.
e. These could be “Nuclear Family” groups of 2 adults with young children. They are not very active, as their total number of check-ins is in the lower range compared to other groups.
f. The park could offer bundled tickets with 2 adult tickets and 2 child tickets, priced lower than buying them individually.
Big Group 3
a. The group size ranges from 1 to 44.
b. They prefer Thrill Rides and have an equal preference for Rides for Everyone and Kiddie Rides.
c. There are 510 such groups.
d. Most of them check in in the morning and spend one day in the theme park.
e. What can you infer about the group?
f. If you were to make one improvement to the park to better meet this group’s needs, what would it be?
Group 3a
Fig 3a-1. Hierarchical Cluster of Group 3a
Fig 3a-2. Group sizes found in Group 3a
Fig 3a-3. Distribution of Total check-in over 3 days (breakdown by Ride Type)
Fig 3a-4. Other observations about Group 3a
MC1.2 – Are there notable differences in the patterns of activity in the park across the three days? Please describe the notable differences you see.
Limit your response to no more than 3 images and 300 words.
The 3 days of data were concatenated into 1 dataset so that we could check for different patterns across the 3 days. The horizontal axis shows the timestamp, grouped by day of week and hour of day. The vertical axis shows the distinct count of visitors. We used a filter to exclude movement-type records. The X and Y coordinates were concatenated into a single key so that we could match each check-in record to a specific Ride Type and Location. We also used the Quick Filter in Tableau to check the patterns pertaining to each Ride Type.
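The chart logic described above (drop movement records, map the concatenated X,Y key to a Ride Type and Location, then count distinct visitors per day and hour) can be sketched outside Tableau as follows. This is a hedged illustration only: the record layout, the coordinates in `LOCATIONS`, the “Example Ride” entry, the sample dates and the function name `distinct_visitors` are all assumptions, not the challenge schema.

```python
from collections import defaultdict
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical lookup from the concatenated "X,Y" key to (Ride Type, Location).
# Coordinates are made up; "Example Ride" is a placeholder name.
LOCATIONS = {
    "10,20": ("Thrill Rides", "TerrorSaur"),
    "30,40": ("Rides for Everyone", "Example Ride"),
}

# Hypothetical records: (timestamp, visitor_id, record_type, x, y).
records = [
    ("2014-06-06 09:15:00", 1, "check-in", 10, 20),
    ("2014-06-06 09:40:00", 1, "check-in", 10, 20),  # same visitor, same hour
    ("2014-06-06 09:50:00", 2, "check-in", 10, 20),
    ("2014-06-06 09:55:00", 3, "movement", 10, 20),  # excluded by the filter
    ("2014-06-07 10:05:00", 2, "check-in", 30, 40),
]


def distinct_visitors(rows, ride_type=None):
    """Count distinct visitors per (day of week, hour), check-ins only.

    ride_type mimics a quick filter: when given, only that Ride Type is kept.
    """
    seen = defaultdict(set)
    for timestamp, visitor, record_type, x, y in rows:
        if record_type != "check-in":
            continue
        key = f"{x},{y}"  # concatenated coordinate key
        if key not in LOCATIONS:
            continue
        row_ride_type, _location = LOCATIONS[key]
        if ride_type is not None and row_ride_type != ride_type:
            continue
        t = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        seen[(DAYS[t.weekday()], t.hour)].add(visitor)
    return {slot: len(visitors) for slot, visitors in seen.items()}


all_rides = distinct_visitors(records)
thrill_only = distinct_visitors(records, ride_type="Thrill Rides")
```

Using a set per (day, hour) slot is what makes the count distinct: a visitor who checks in twice within the same hour is counted once.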
On Fri and Sat, we can see that there are 2 shows at Grinosaurus Stage, at 10am and 3pm. However, on Sun (8 Jun), there was only 1 show, at 10am. From this, we deduce that the crime happened on Sun between 9am and 10am. This will be investigated in detail in the answer to Question 3.
Another observation is that there are no visitors at Creighton Pavilion at 10am and 3pm, as the Pavilion is closed when the show starts at Grinosaurus Stage.
Looking at the Thrill Rides across the 3 days, all of them follow a similar pattern except for the TerrorSaur ride. On Friday, TerrorSaur has a similar pattern to the rest of the Thrill Rides. On Sat and Sun, however, its visitor count was much lower than that of the other rides, and it plateaued between 10am and the evening (approx. 5pm), unlike on Fri. As we have a high number of repeat visitors (approximately 4 in 10 are repeat visitors), we deduce that TerrorSaur followed a similar pattern to the other rides on Friday because the visitors were trying out all the rides. On Sat and Sun, these repeat visitors opted for the rides they found more interesting, and they did not find the TerrorSaur ride very interesting. Hence TerrorSaur ridership was much lower on Sat and Sun compared to the other rides.
The above image shows a spike in visitors seeking information and assistance at 9am and 1pm, corresponding to the usual peak at opening time and to the 3pm showtime respectively. This is probably due to visitors asking for directions to the Grinosaurus Stage and what time the show starts. The spike is largest on Friday at 1pm, as this is the first day of the show.
MC1.3 – What anomalies or unusual patterns do you see? Describe no more than 10 anomalies, and prioritize those unusual patterns that you think are most likely to be relevant to the crime.
Limit your response to no more than 10 images and 500 words.
The data are plotted by visitor ID, showing each visitor’s activities in the park over a given time period. ‘Movement’ records are indicated by small blue circles, while ‘check-in’ records use bigger shapes so that they can be distinguished easily. Only the Entrance, Creighton Pavilion and Grinosaurus Stage shapes are different, as they are the points of interest in the investigation. The 3 Entrances are shown as triangles, the Creighton Pavilion as a cross, and Grinosaurus Stage as an asterisk. A blank space between activities means the visitor is inside a ride or is stationary until the next movement record.
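The blank spaces in a visitor’s timeline can also be located programmatically rather than by eye. The sketch below is an assumption-laden illustration: the timestamp format, the sample values, the 10-minute threshold and the function name `find_gaps` are hypothetical and were not part of our analysis.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def find_gaps(timestamps, min_gap=timedelta(minutes=10)):
    """Return (start, end) pairs of consecutive records at least min_gap apart.

    The 10-minute default is an arbitrary illustrative threshold.
    """
    times = sorted(datetime.strptime(t, FMT) for t in timestamps)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a >= min_gap]


# Hypothetical record timestamps for a single visitor.
visitor_records = [
    "2014-06-06 09:00:00",
    "2014-06-06 09:00:30",
    "2014-06-06 09:25:30",
    "2014-06-06 09:26:00",
]

gaps = find_gaps(visitor_records)
```

Each returned pair marks a period with no records, which on the plot appears as a blank space between two markers.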
By analysing these graphs, we found 5 anomalies that might be related to the crime and/or to possible issues in the theme park.
Another list covers the same case of device malfunction, but these visitors never came to the Pavilion, so they are on the safe list. Even so, malfunctioning devices are a significant issue that the theme park needs to fix.